Elastic Replication for Scalable Consistent Services
Abstract
Most of the scalable, high-performance services used in datacenters today provide relaxed consistency guarantees in order to achieve good responsiveness. One reason is the belief that expensive majority-based consensus protocols are needed to provide strong consistency in asynchronous and partially synchronous environments such as a datacenter or the Internet. In this extended abstract, we briefly describe our research into building a new lightweight replication protocol that does not use majority voting and yet provides strong consistency in the presence of crash faults and imperfect failure detectors.
1 Motivation and related work
Systems are replicated in order to improve availability (by having multiple independently failing copies of a system, the failure of some subset of the copies can be tolerated) and performance (multiple copies can divide load among them). However, with replication comes the question of consistency. A strongly consistent replicated system behaves, externally, identically to its unreplicated counterpart. But making a replicated system strongly consistent can compromise both availability and performance, as the replicas need to coordinate operations. Consequently, many replicated systems have embraced relaxed consistency, in which the replicated system sometimes behaves differently from the unreplicated counterpart.
Cloud computing services are often built using replication protocols such as Primary-Backup [1] or Quorum Intersection [3]. These services often rely on a centralized configuration manager (CCM) [2]. The CCM itself is a system that is replicated using a strongly consistent state machine replication protocol such as Paxos [4].
In the case of Primary-Backup, a primary replica receives all updates, orders them, and forwards them in FIFO order to the non-faulty backup replicas. Clients can read any of the non-faulty replicas. In case of a primary failure, one of the backups becomes the new primary. This is coordinated using the CCM.
Quorum Intersection protocols are useful for put/get-type systems such as key-value stores. A put operation, accompanied by a timestamp, is sent to a “put-quorum,” while a get operation reads from a “get-quorum.” By guaranteeing that any put-quorum and get-quorum intersect, a get can be guaranteed to see the latest put operation. By making quorums smaller than the entire set of replicas, availability and performance are achieved. The CCM is responsible for keeping track of the replicas and the quorum sizes.
Neither of these common replication approaches guarantees strong consistency in the absence of accurate failure detection (aka fail-stop failures [5]). Both can provide stale data: a client might read one replica (or quorum of replicas) to get version n of the data, and afterward another client may read from a replica (or quorum of replicas) that has not yet been updated and has only seen version n−1. But stale data is not the worst problem. In Primary-Backup, if the primary is mistakenly suspected of having failed and a backup is designated primary by the CCM, a client may read from the original primary the result of some operation that is not applied, and never will be, to the new primary. In the case of Quorum protocols, reconfiguration of the set of replicas could temporarily result in non-intersecting quorums.
2 Elastic Replication
Elastic replication is a new lightweight crash-tolerant replication protocol. It supports strong consistency semantics, fast reconfiguration, and flexible replica placement policies, and it does not rely on a CCM or accurate failure detection.
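The quorum-intersection requirement discussed in Section 1, and the way a reconfiguration can break it, can be illustrated with a small sketch. This is not taken from the paper: the function names are hypothetical, and the model assumes that any `put_size` replicas form a put-quorum and any `get_size` replicas form a get-quorum.

```python
from itertools import combinations


def quorums_always_intersect(n_replicas: int, put_size: int, get_size: int) -> bool:
    """Size-based check (an assumed model): every put-quorum shares a replica
    with every get-quorum exactly when put_size + get_size > n_replicas."""
    return put_size + get_size > n_replicas


def find_disjoint_quorums(replicas, put_size, get_size):
    """Brute-force search for a put-quorum and a get-quorum with no replica
    in common. Returns the first such pair found, or None if none exists."""
    for put_q in combinations(replicas, put_size):
        for get_q in combinations(replicas, get_size):
            if not set(put_q) & set(get_q):
                return put_q, get_q
    return None


# Three replicas with quorums of size 2: every get overlaps every put.
old_config = ["r1", "r2", "r3"]
print(quorums_always_intersect(len(old_config), 2, 2))
print(find_disjoint_quorums(old_config, 2, 2))

# Hypothetical reconfiguration: two replicas are added, but clients still
# use the old quorum size of 2. A get can now miss the latest put entirely.
new_config = ["r1", "r2", "r3", "r4", "r5"]
print(quorums_always_intersect(len(new_config), 2, 2))
print(find_disjoint_quorums(new_config, 2, 2))
```

In this simplified model, the window between changing the replica set and updating the quorum sizes is where a get-quorum can fail to intersect the most recent put-quorum.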
Similar resources
Data Replication-Based Scheduling in Cloud Computing Environment
Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems like data grid, cloud computing provides these factors in a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...
CATS: Linearizability and Partition Tolerance in Scalable and Self-Organizing Key-Value Stores
Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks, when data is distributed and replicated automatically by the principle of consistent hashing. This paper introduces consistent quorums as a solution for achieving atomic c...
A Tool for Massively Replicating Internet Archives: Design, Implementation, and Experience
This paper reports the design, implementation, and performance of a scalable and efficient tool to replicate Internet information services. Our tool targets replication degrees of tens of thousands of weakly-consistent replicas scattered throughout the Internet’s thousands of autonomously administered domains. The main goal of our replication tool is to make existing replication algorithms scal...
FRAPPE: Fast Replication Platform for Elastic Services
Elasticity is critical for today’s cloud services, which must be able to quickly adapt to dynamically changing load conditions and resource availability. We introduce FRAPPÉ, a new consistent replication platform aiming at improving elasticity of the replicated services hosted in clouds or large data centers. In the core of FRAPPÉ is a novel replicated state machine protocol, which employs spec...
Reducing Replication Overhead for Data Durability in DHT Based P2P System
DHT based p2p systems appear to provide scalable storage services with idle resource from many unreliable clients. If a DHT is used in storage intensive applications where data loss must be minimized, quick replication is especially important to replace lost redundancy on other nodes in reaction to failures. To achieve this easily, a simple replication method directly uses a consistent set, suc...